DESIGN OF FAST AND EFFICIENT HYBRID-FPGAs FOR NUMERICALLY INTENSIVE APPLICATIONS IN FLUID DYNAMICS AND IMAGE/VIDEO PROCESSING
Authors
Abstract
Numerical simulations in Computational Fluid Dynamics involve modules that solve sets of equations (2nd-order Crank-Nicolson/Adams-Bashforth, 3rd-order Runge-Kutta time-stepping, etc.); these modules share common computational features and are executed iteratively. Similarly, video and image processing applications involve tasks (mosaic building to compress video into images, image compression such as DCT and DWT, etc.) that also share common computational features and require iterative processing. Both domains are highly relevant to military applications. Several researchers have shown that these applications are well suited to execution on spatially parallel processor architectures. FPGAs in particular offer large numbers of on-chip spatially parallel units and can therefore perform orders of magnitude faster than conventional serial processors. However, FPGAs are application agnostic and hence incur penalties: clock cycles lost to redundant reconfigurations, generic routing, and poor memory architectures, all of which impact speed, power, and silicon area. These factors led us to explore the reconfigurable architecture design space with the application domain prioritized. This paper focuses on extracting tasks, or core clusters, from the Control Data Flow Graphs (CDFGs) of multimedia and fluid dynamics applications, and then designing the architecture to embed them in Hybrid-FPGA environments. By Hybrid, we mean that the proposed FPGA architectures will involve LUT regions, ASIC regions, and possibly VPGA regions. Tasks or core clusters, obtained through common sub-graph analysis between basic blocks within and across routines, are essentially recurring computation patterns and are implemented as ASICs in the non-reconfigurable area. After the common sub-graphs are removed from the CDFG, the remaining parts of each basic block are implemented in the LUT-based reconfigurable area.
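The idea of finding computation patterns that recur across basic blocks can be illustrated with a deliberately simplified sketch. It is not the paper's common sub-graph analysis: it only counts two-node producer/consumer operation pairs, and the block names, the tuple representation, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy input: each basic block is a list of
# (result, operation, operands) tuples in program order.
BLOCKS = {
    "bb_cfd_step": [
        ("t1", "mul", ("a", "b")),
        ("t2", "add", ("t1", "c")),
        ("t3", "mul", ("t2", "d")),
    ],
    "bb_dct_row": [
        ("u1", "mul", ("x", "w")),
        ("u2", "add", ("u1", "y")),
        ("u3", "sub", ("u2", "z")),
    ],
}


def op_pairs(block):
    """Return the set of (producer_op, consumer_op) data-flow edges in one block."""
    producer = {}  # variable name -> operation that produced it
    pairs = set()
    for result, op, operands in block:
        for operand in operands:
            if operand in producer:
                pairs.add((producer[operand], op))
        producer[result] = op
    return pairs


def recurring_patterns(blocks, min_blocks=2):
    """Map each operation pair to the blocks it occurs in, keeping only
    pairs that appear in at least `min_blocks` different blocks."""
    seen_in = defaultdict(set)
    for name, block in blocks.items():
        for pair in op_pairs(block):
            seen_in[pair].add(name)
    return {
        pair: sorted(names)
        for pair, names in seen_in.items()
        if len(names) >= min_blocks
    }


if __name__ == "__main__":
    print(recurring_patterns(BLOCKS))
```

In this toy input the multiply-then-add pair appears in both blocks, so under the scheme described above it would be a candidate for a fixed core, while the block-specific remainder would stay in reconfigurable logic. A real analysis would match larger sub-graphs and weigh area and timing.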
A packing mechanism is proposed that improves the routing architecture by reducing switching requirements by 12-20%. For configurable logic, this mechanism uses live-in/live-out variable analysis and scheduling information from the CDFGs in its cost function, in addition to the routability- and timing-driven cost metrics defined by other researchers. We conducted experiments on several complex routines from the target applications and obtained map/synthesis reports based on Xilinx architectures. The results show that partial reconfiguration with computation cores embedded in a sea of LUTs offers the potential for massive savings in gate density. In addition, by eliminating unnecessary and redundant sub-circuit pattern configurations, switching requirements in the configurable area are reduced due to localization of global …
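The live-in/live-out variable analysis mentioned in the abstract is a standard compiler technique; a minimal sketch for straight-line basic blocks is given below. How the paper feeds these sets into its packing cost function is not stated here, so the sketch stops at computing the sets. The statement representation and all names are assumptions.

```python
def live_in(block, live_out):
    """Compute the live-in set of a basic block by a backward pass.

    `block` is a list of (dest, sources) statements in program order
    (a hypothetical representation); `live_out` is the set of variables
    still needed after the block.
    """
    live = set(live_out)
    for dest, sources in reversed(block):
        live.discard(dest)    # dest is defined here, so not live above
        live.update(sources)  # sources must be available above
    return live


def live_out_from_successors(successor_live_ins):
    """A block's live-out set is the union of its successors' live-in sets."""
    result = set()
    for live in successor_live_ins:
        result |= live
    return result


# Two hypothetical blocks, with BLOCK_B the only successor of BLOCK_A.
BLOCK_A = [("t1", ("a", "b")), ("t2", ("t1", "c"))]
BLOCK_B = [("r", ("t2", "k"))]

b_in = live_in(BLOCK_B, {"r"})
a_out = live_out_from_successors([b_in])
a_in = live_in(BLOCK_A, a_out)

if __name__ == "__main__":
    print(sorted(a_in), sorted(a_out), sorted(b_in))
```

Variables that cross a block boundary (here `t2` and `k`) are the ones that need routing between packed regions, which is presumably why such sets are informative for a packing cost function.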
Similar resources
A Stochastic Image Grammar for Fine-Grained 3D Scene Reconstruction
This paper presents a stochastic grammar for fine-grained 3D scene reconstruction from a single image. At the heart of our approach is a small number of grammar rules that can describe the most common geometric structures, e.g., two straight lines being co-linear or orthogonal, or a line lying on a planar region. With these grammar rules, we re-frame single-view 3D reconstruction probl...
Fast and Robust Generation of City Scale Urban Ground Plan
Since the introduction of the concept of Digital Earth, almost every major international city has been reconstructed in the virtual world. A large volume of geometric models describing urban objects has become freely available in the public domain via software like Google Earth. Although mostly created for visualization, these urban models can benefit many applications beyond visualization includi...
Evenly Spaced Streamlines for Surfaces: An Image-Based Approach
We introduce a novel, automatic streamline seeding algorithm for vector fields defined on surfaces in 3D space. The algorithm generates evenly spaced streamlines fast, simply and efficiently for any general surface-based vector field. It is general because it handles large, complex, unstructured, adaptive resolution grids with holes and discontinuities, does not require a parametrization, and c...
Order-p Tensors: Factoring and Applications
Saturday 10:15 – 12:15 Carol Woodward, LLNL Victoria Howle, Texas Tech Misha Kilmer, Tufts University Carla D. Martin, James Madison University Saturday 3:15 – 5:15 Jingmei Qiu, Colorado School of Mines Fengyan Li, Rensselaer Polytechnic Institute Lorena Barba, Boston University Sarah Olson, WPI (as of Fall 2011) Saturday 10:15 – 12:15 Accelerated Fixed Point Methods for Subsurface Flow Problem...
Receiver-based congestion control mechanism for Internet video transmission
Efficient transmission of delay-stringent video over the Internet is examined in this work. In our design, an end user adjusts the network load depending on the perceived network status to satisfy the available bandwidth and the individual loss rate requirement while exhibiting network friendly behavior. In particular, we present a model-free, receiver-based congestion control mechanism that by...